Here is the data set.
The two-way ANOVA will test whether the independent variables (type of training [Yoga vs. Cardio vs. HIIT] AND Gender of training [F vs. M]) have an effect on the dependent variable (well-being - score). But there are some other possible sources of variation in the data that we want to take into account.
We are going to ask if there is an effect of type of training on participants’ well being scores
## Happy_Score Sport_type Gender
## 1 20 yoga m
## 2 75 yoga w
## 3 70 yoga w
## 4 55 yoga m
## 5 65 yoga w
## 6 35 yoga m
## 7 25 yoga w
## 8 65 yoga w
## 9 50 yoga m
## 10 55 yoga w
## 11 75 cardio m
## 12 40 cardio w
## 13 55 cardio m
## 14 77 cardio w
## 15 80 cardio m
## 16 70 cardio m
## 17 35 cardio w
## 18 50 cardio m
## 19 60 cardio w
## 20 80 cardio m
## 21 25 HIIT w
## 22 35 HIIT w
## 23 45 HIIT m
## 24 76 HIIT m
## 25 30 HIIT m
## 26 30 HIIT w
## 27 40 HIIT w
## 28 65 HIIT m
## 29 55 HIIT m
## 30 15 HIIT m
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
And here it is for the flipped plot:
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
mu_11_w = mean(dat$Happy_Score[dat$Sport_type == "yoga" & dat$Gender == "w"])
mu_11_m = mean(dat$Happy_Score[dat$Sport_type == "yoga" & dat$Gender == "m"])
mu_12_w = mean(dat$Happy_Score[dat$Sport_type == "cardio" & dat$Gender == "w"])
mu_12_m = mean(dat$Happy_Score[dat$Sport_type == "cardio" & dat$Gender == "m"])
mu_13_w = mean(dat$Happy_Score[dat$Sport_type == "HIIT" & dat$Gender == "w"])
mu_13_m = mean(dat$Happy_Score[dat$Sport_type == "HIIT" & dat$Gender == "m"])
\(\color{red}{Y_{ijk} = \mu_{jk} + \alpha_{j} + \beta_{k} + \gamma_{jk} + \epsilon_{ijk}}\)
Looks scary, right?
Lets break it down:
k = 3
j = 2
Lets start with computing the means across all levels of all populations
| Happy_Score | Sport_type | Gender |
|---|---|---|
| 20 | yoga | m |
| 75 | yoga | w |
| 70 | yoga | w |
| 55 | yoga | m |
| 65 | yoga | w |
| 35 | yoga | m |
| 25 | yoga | w |
| 65 | yoga | w |
| 50 | yoga | m |
| 55 | yoga | w |
| 75 | cardio | m |
| 40 | cardio | w |
| 55 | cardio | m |
| 77 | cardio | w |
| 80 | cardio | m |
| 70 | cardio | m |
| 35 | cardio | w |
| 50 | cardio | m |
| 60 | cardio | w |
| 80 | cardio | m |
| 25 | HIIT | w |
| 35 | HIIT | w |
| 45 | HIIT | m |
| 76 | HIIT | m |
| 30 | HIIT | m |
| 30 | HIIT | w |
| 40 | HIIT | w |
| 65 | HIIT | m |
| 55 | HIIT | m |
| 15 | HIIT | m |
| Happy_Score | Sport_type | Gender |
|---|---|---|
| 20 | yoga | m |
| 75 | yoga | w |
| 70 | yoga | w |
| 55 | yoga | m |
| 65 | yoga | w |
| 35 | yoga | m |
| 25 | yoga | w |
| 65 | yoga | w |
| 50 | yoga | m |
| 55 | yoga | w |
| 75 | cardio | m |
| 40 | cardio | w |
| 55 | cardio | m |
| 77 | cardio | w |
| 80 | cardio | m |
| 70 | cardio | m |
| 35 | cardio | w |
| 50 | cardio | m |
| 60 | cardio | w |
| 80 | cardio | m |
| 25 | HIIT | w |
| 35 | HIIT | w |
| 45 | HIIT | m |
| 76 | HIIT | m |
| 30 | HIIT | m |
| 30 | HIIT | w |
| 40 | HIIT | w |
| 65 | HIIT | m |
| 55 | HIIT | m |
| 15 | HIIT | m |
| Happy_Score | Sport_type | Gender |
|---|---|---|
| 20 | yoga | m |
| 75 | yoga | w |
| 70 | yoga | w |
| 55 | yoga | m |
| 65 | yoga | w |
| 35 | yoga | m |
| 25 | yoga | w |
| 65 | yoga | w |
| 50 | yoga | m |
| 55 | yoga | w |
| 75 | cardio | m |
| 40 | cardio | w |
| 55 | cardio | m |
| 77 | cardio | w |
| 80 | cardio | m |
| 70 | cardio | m |
| 35 | cardio | w |
| 50 | cardio | m |
| 60 | cardio | w |
| 80 | cardio | m |
| 25 | HIIT | w |
| 35 | HIIT | w |
| 45 | HIIT | m |
| 76 | HIIT | m |
| 30 | HIIT | m |
| 30 | HIIT | w |
| 40 | HIIT | w |
| 65 | HIIT | m |
| 55 | HIIT | m |
| 15 | HIIT | m |
| Happy_Score | Sport_type | Gender |
|---|---|---|
| 20 | yoga | m |
| 75 | yoga | w |
| 70 | yoga | w |
| 55 | yoga | m |
| 65 | yoga | w |
| 35 | yoga | m |
| 25 | yoga | w |
| 65 | yoga | w |
| 50 | yoga | m |
| 55 | yoga | w |
| 75 | cardio | m |
| 40 | cardio | w |
| 55 | cardio | m |
| 77 | cardio | w |
| 80 | cardio | m |
| 70 | cardio | m |
| 35 | cardio | w |
| 50 | cardio | m |
| 60 | cardio | w |
| 80 | cardio | m |
| 25 | HIIT | w |
| 35 | HIIT | w |
| 45 | HIIT | m |
| 76 | HIIT | m |
| 30 | HIIT | m |
| 30 | HIIT | w |
| 40 | HIIT | w |
| 65 | HIIT | m |
| 55 | HIIT | m |
| 15 | HIIT | m |
| Happy_Score | Sport_type | Gender |
|---|---|---|
| 20 | yoga | m |
| 75 | yoga | w |
| 70 | yoga | w |
| 55 | yoga | m |
| 65 | yoga | w |
| 35 | yoga | m |
| 25 | yoga | w |
| 65 | yoga | w |
| 50 | yoga | m |
| 55 | yoga | w |
| 75 | cardio | m |
| 40 | cardio | w |
| 55 | cardio | m |
| 77 | cardio | w |
| 80 | cardio | m |
| 70 | cardio | m |
| 35 | cardio | w |
| 50 | cardio | m |
| 60 | cardio | w |
| 80 | cardio | m |
| 25 | HIIT | w |
| 35 | HIIT | w |
| 45 | HIIT | m |
| 76 | HIIT | m |
| 30 | HIIT | m |
| 30 | HIIT | w |
| 40 | HIIT | w |
| 65 | HIIT | m |
| 55 | HIIT | m |
| 15 | HIIT | m |
\(H_{0}:\alpha_{j} = 0\) for all \(j\)
\(H_{1}:\alpha_{j} ≠ 0\) for at least one \(j\)
\(H_{0}:\beta_{k} = 0\) for all \(j\)
\(H_{1}:\beta_{k} ≠ 0\) for at least one \(j\)
\(H_{0}:\gamma_{jk} = 0\) for all combinations of \(jk\)
\(H_{1}:\gamma_{jk} ≠ 0\) for at least one combinations of \(jk\)
\[\color{red}{Y_{ijk} = \mu_{jk} + \alpha_{j} + \beta_{k} + \gamma_{jk} + \epsilon_{ijk}}\]
## Df Sum Sq Mean Sq F value Pr(>F)
## Sport_type 2 2123 1061.4 3.644 0.0415 *
## Gender 1 103 102.8 0.353 0.5581
## Sport_type:Gender 2 1895 947.6 3.253 0.0562 .
## Residuals 24 6990 291.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The effect size used for ANOVA. It easures the proportion of the total variance in a dependent variable that is associated with the membership of different groups defined by an independent variables.
Partial \(\eta^2\) is a similar measure in which the effects of other independent variables and interactions are partialled out.
If not… here it is: \(\eta^2= \frac{\sigma^2_{zw}}{\sigma^2_{tot}}\)
In the two-factor anova we extend the above-written formula to a similar one. However, this time we have to consider the new parameters: - \(\sigma^2_{factor1}\) - \(\sigma^2_{factor2}\) - \(\sigma^2_{interaction}\) - \(\sigma^2_{DV}\)
–>
The overall effectsize \(\eta^2_{tot} = \sigma^2_{factor1} + \sigma^2_{factor2} + \sigma^2_{interaction} + \sigma^2_{DV}\)
It measures the proportion of variance explained by a given variable of the total variance remaining after accounting for variance explained by other variables in the model.
library(DescTools)
EtaSq(mod_anova)
## eta.sq eta.sq.part
## Sport_type 0.191030749 0.23291852
## Gender 0.009247787 0.01448637
## Sport_type:Gender 0.170568077 0.21329045
Example of an interpretation of the results: + \(\eta^2\) is the proportion of variance the
treatment accounts for in the wellbeing of participants. In the case of
the sport type, we see that 19.1% of the total variance of wellbeing is
explained by sport participants did.
+ \(\eta_{part}^2\) is the proportion
of variance the treatment accounts for in the wellbeing of participants.
In the case of the sport type, we see that 23.3% of the variance is
explained by the sport type once the gender and the interaction effect
are “taken out”.
For two-way ANOVA, the ratio between the mean sum of squares of a specific factor and the mean of sum of squares of the residuals (i.e., the variability within) is testd.
We use the pf() to calculate the area under the curve for the interval [0,4.226] and the interval [4.226,+∞) of a F curve with with \(v1=1\) and \(v2=24\)
x = sum_anova[[1]][["F value"]][1]
df_factor1 = 1
df_inn = 24
# interval $[0,1.5]
pf(x, df1 = df_factor1, df2 = df_inn, lower.tail = TRUE)
## [1] 0.9317056
x = sum_anova[[1]][["F value"]][1]
df_factor1 = 1
df_inn = 24
pf(x, df1 = df_factor1, df2 = df_inn, lower.tail = FALSE)
## [1] 0.06829437
library(latex2exp)
x <- rf(100000, df1 = 2, df2 = 24)
hist(x,
breaks = 'Scott',
freq = FALSE,
xlim = c(0,5),
ylim = c(0,1.5),
xlab = '',
main = (TeX('Histogram for a $\\F$-distribution with $\\v_1 = 2$ and $\\v_2 = 24$ degrees of freedom (df)')), cex.main=0.9)
curve(df(x, df1 = 1, df2 = 24), from = 0, to = 5, n = 5000, col= 'red', lwd=3, add = T)
abline(v=4.227, col = "pink", lwd=2, lty='dashed')
abline(v=.321, col = "blue", lwd=2, lty='dashed')
abline(v=3.46, col = "green", lwd=2, lty='dashed')